Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations
Authors
Abstract
Motivated by the success of representing speech characteristics with color attributes, we analyzed the cross-modal sentiment correlations between voice source characteristics and textural image characteristics. For the analysis, we used vowel sounds with three representative phonation types (modal, creaky and breathy) and 36 texture images, each annotated with one of 36 semantic attributes (e.g., banded, cracked and scaly). We asked 40 subjects to select the best-fitting textures from the 36 images after listening to 30 speech samples with different phonations. We then measured the correlations between acoustic parameters that capture voice source variation and parameters of the selected texture images that capture coarseness, contrast, directionality, busyness, complexity and strength. The texture selections show that voice characteristics can be roughly characterized by textural differences: modal voice by gauzy, banded and smeared textures; creaky voice by porous, crystalline, cracked and scaly textures; and breathy voice by smeared, freckled and stained textures. We also found significant correlations between voice source acoustic parameters and textural parameters. These correlations suggest that a cross-modal mapping between voice source characteristics and textural parameters is possible, which would enable visualization of speech information in which source variations reflect human sentiment perception.
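The correlation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here. The function names `pearson` and `correlation_table` are hypothetical, and so is the idea of storing one value per speech sample for each acoustic and textural parameter.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products and centered squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_table(acoustic, texture):
    """Correlate every acoustic parameter with every textural parameter.

    Both arguments map a parameter name to a list holding one value per
    speech sample (hypothetical layout, not taken from the paper).
    """
    return {
        (a_name, t_name): pearson(a_vals, t_vals)
        for a_name, a_vals in acoustic.items()
        for t_name, t_vals in texture.items()
    }
```

Under these assumptions, calling `correlation_table` with a dictionary of acoustic measures and a dictionary of the six textural parameters (coarseness, contrast, directionality, busyness, complexity, strength) would give one coefficient per pair. Testing those coefficients for significance would need a separate step.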
Similar resources
Cross-modal description of sentiment information embedded in speech
Looking for new possibilities to describe the information embedded in speech, we have carried out sentiment correlation analysis between speech features and color attributes. Using single vowel utterances with different prosody and sound pressure level, we have asked subjects to select colors based on their perceptual impressions after listening to them. By analyzing selected color attributes usin...
YouTube Movie Reviews: In, Cross, and Open-domain Sentiment Analysis in an Audiovisual Context
In this contribution we focus on the task of automatically analyzing a speaker’s sentiment in on-line videos containing movie reviews. In addition to textual information, we consider adding audio features as typically used in speech-based emotion recognition as well as video features encoding valuable valence information conveyed by the speaker. We combine this multi-modal experimental setup wi...
Vocal Parameters of Adults with Down Syndrome in Zahedan /Iran
Background & Aims: Previous studies have indicated significant differences in vocal parameters between children with Down syndrome and normal children, but there are challenges about these differences. In this study vocal parameters and Maximum Phonation Time (MPT) in adults with Down syndrome have been investigated. Method: This cross-sectional and analytic study was performed on 22 adults wit...
Relationship of Various Open Quotients With Acoustic Property, Phonation Types, Fundamental Frequency, and Intensity.
INTRODUCTION In the present study, we examined the relationship between various open quotients (Oqs) and phonation types, fundamental frequency (F0), and intensity by multivariate linear regression analysis (MVA) to determine which Oq best reflects vocal fold vibratory characteristics. METHODS Using high-speed digital imaging (HSDI), a sustained vowel /e/ at different phonation types, F0s, an...
Phonation Contrasts across Languages
This study compares the phonetics of phonation categories within and across four languages: Gujarati (modal, breathy), White Hmong (modal, breathy, creaky), Jalapa Mazatec (modal, breathy, creaky), and Southern Yi (tense, lax). In addition to acoustic measures in all four languages, electroglottographic measures were also compared for Gujarati, Hmong, and Yi. Several measures distinguished phon...